Adaptive feedback schemes are promising for quantum-enhanced measurements yet are complicated
to design. Machine learning can autonomously generate algorithms in a classical setting. Here we 
adapt machine learning for quantum information and use our framework to generate autonomous 
adaptive feedback schemes for quantum measurement. In particular our approach replaces guesswork 
in quantum measurement by a logical, fully-automatic, programmable routine. We show that our 
method yields schemes that outperform the best known adaptive scheme for interferometric phase 
estimation. 



In classical physics, it is assumed that detectors and con- 
trols can be arbitrarily accurate, restricted only by tech- 
nical limitations. However, this paradigm is valid only 
on a scale where quantum effects can be ignored. The 
'standard quantum limit' (SQL) [1] restricts achievable 
precision, beyond which measurement must be treated 
on a quantum level. Heisenberg's uncertainty principle 
provides a much lower but insurmountable bound for the 
accuracy of measurement and feedback. Approaching the 
Heisenberg limit is an important goal of quantum meas- 
urement. 

The problem of quantum measurement can be stated as 
follows. A quantity such as spatial displacement, energy 
fluctuation, phase shift, or combination thereof, must be 
measured precisely within a specific duration of time. A 
typical device has an input and output, and the relation 
between the input and output yields information from 
which the quantity of interest can be inferred. 

Important examples of practical quantum measure- 
ment problems within limited time include atomic clocks 
[2] and gravitational-wave detection [3]. Extensive efforts
are underway to detect gravitational waves with laser- 
interferometers. The precision of these detectors is ulti- 
mately limited by the number of photons available to the 
interferometer within the duration of the gravitational- 
wave pulse [4]. The SQL to measurement is a concern for 
opening up a new field of gravitational-wave astronomy 
[5]. 

For the typical two-channel interferometer, shown in 
Fig. 1, the goal is to estimate the relative phase shift $\varphi$
between the two channels. The interferometer has two 
input ports and two output ports, and we consider each 
input and output field as being a single mode. 

Each input photon to the interferometer provides a single quantum bit, or 'qubit', as the
photon is a superposition of $|0\rangle$, which represents the photon proceeding down one
channel, and $|1\rangle$, corresponding to traversing the other channel. Each photon is either
detected as leaving one of the two output ports or is lost. Thus, quantum measurement can
extract no more than one bit of information about $\varphi$ per qubit in the input state [6, 7].

The fundamental precision bound is given by the 'Heisenberg limit': the standard deviation
$\Delta\varphi$ of the phase estimate scales as $1/N$ for $N$ the number of input qubits used
for the measurement. $\Delta\varphi$ is determined by the error probability distribution
$P(\varsigma)$ for estimating $\varphi$ with error $\varsigma$. As $\varphi$ is cyclic over
$2\pi$, $\Delta\varphi$ is related to the Holevo variance $V$ by [8]

$$V = (\Delta\varphi)^2 = S^{-2} - 1, \qquad S = \left| \int_0^{2\pi} P(\varsigma)\, e^{i\varsigma}\, d\varsigma \right|. \tag{1}$$

$S$ is the 'sharpness' [9] of $P(\varsigma)$. In contrast, classical measurements only manage
to achieve the SQL scaling $\Delta\varphi \sim 1/\sqrt{N}$ due to partition noise for photons
passing through the beam splitter. Quantum alternatives such as injecting squeezed light into
one port of the interferometer can partially evade partition noise [10].
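
As a small illustration of Eq. (1), the sharpness can be estimated from a finite sample of phase errors by replacing the integral with a sample mean. This is only a sketch under that assumption; the function names `sharpness` and `holevo_variance` and the sample values are hypothetical and do not come from the original work.

```python
import cmath
import math

def sharpness(errors):
    """Sample estimate of S = |<exp(i * error)>| for phase errors in radians."""
    mean = sum(cmath.exp(1j * e) for e in errors) / len(errors)
    return abs(mean)

def holevo_variance(errors):
    """Holevo variance V = S**-2 - 1, as in Eq. (1)."""
    return sharpness(errors) ** -2 - 1

# Hypothetical sample of estimation errors (radians), for illustration only.
sample = [0.1, -0.1, 0.05, -0.05]
print(holevo_variance(sample))
```

Because the errors enter only through $e^{i\varsigma}$, shifting any error by $2\pi$ leaves the result unchanged, which is why this measure suits a cyclic quantity.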

Since for any time-limited interferometric measurement the number of input qubits $N$
determines the achievable precision, we define $N$ as the relevant cost for the measurement.
However, it is important to discriminate resources required to operate a measurement device
from the ones used to develop it. Accordingly, we distinguish between operational and
developmental cost. The strategic question concerns the design of a device with a certain
operational cost, so that its precision surpasses the SQL and scales as close to the
Heisenberg limit as possible.

Figure 1. Adaptive feedback scheme for interferometric phase estimation: Mach-Zehnder
interferometer with an unknown phase difference $\varphi$ between the two arms and an
additional controllable phase shifter $\Phi$. The input state $|\Psi_N\rangle$ is stored in a
quantum memory and one qubit at a time is transformed into a photonic qubit and sent through
the interferometer. The processing unit (PU) sets the value of the phase shifter $\Phi$
depending on the measurement outcome of the single photon detectors $c_0$ and $c_1$. Adaptive
feedback at step $m = 2$ is depicted. That is, two of the $N$ input photons (the lowest two
circles) have been sent through the interferometer and measured in previous steps.

Quantum measurement schemes employing adaptive 
feedback are most effective, since accumulated inform- 
ation from measurements is exploited to maximize the 
information gain in subsequent measurements. Such ad- 
aptive measurements have been experimentally shown 
to be a powerful technique to achieve precision beyond 
the standard quantum limit [11, 12]. However, devising 
'policies', which determine feedback actions, is generally 
challenging and typically involves guesswork. Our aim is 
to deliver a method for an automated design of policies 
based on machine learning [13]. To show the power of our 
framework, we apply it to adaptive phase estimation. As 
we will show, the policies generated by our method out- 
perform the best known solutions for this problem. 

Fig. 1 shows how a two-channel quantum interferometric measurement with feedback operates. We
inject an $N$-photon input state $|\Psi_N\rangle$ into a Mach-Zehnder interferometer with an
unknown phase shift $\varphi$ in one arm and a controllable phase shift $\Phi$ in the other
arm. Detectors at the two output ports measure which way the photon left, and this information
is transmitted to a processing unit (PU), which determines how $\Phi$ should be adjusted for
the next input qubit. We show that, after all $N$ input qubits have been sent through the
interferometer, $\varphi$ can be inferred with a precision that scales closely to the
Heisenberg limit.

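We use the input state $|\Psi_N\rangle = \sum_{n,k=0}^{N} c_{n,k}\, |n, N-n\rangle$ from [14], with

$$c_{n,k} = \left(\tfrac{N}{2}+1\right)^{-1/2} \sin\!\left(\tfrac{k+1}{N+2}\pi\right) e^{i\frac{\pi}{2}(k-n)}\, d^{N/2}_{n-\frac{N}{2},\,k-\frac{N}{2}}\!\left(\tfrac{\pi}{2}\right)$$

and $d^{j}_{\nu,\mu}(\beta)$ Wigner's (small) $d$-matrix [15]. $|n, N-n\rangle$ denotes a
symmetrized state of $N$ suitably delayed photons with $n$ photons in channel $|0\rangle$ and
$N-n$ in $|1\rangle$ [16].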

The challenge is to find a feedback policy, i.e. an algorithm to run in the PU, that adjusts
$\Phi$ optimally. Fortunately, the area of machine learning suggests a
promising approach. However, standard machine learning
assumes classical bits as input and output. We inject 
a sequence of entangled qubits and obtain output bits. 
Due to the entanglement, the state of the remaining in- 
put qubits is progressively updated by the measurement. 
Consequently, the input to the system (except the first 
qubit) depends on the unknown system parameters. As 
a result, the space of quantum measurement policies is 
generically non-convex, which makes policies hard to op- 
timize. 

Particle swarm optimization (PSO) algorithms [17] are remarkably successful for solving
non-convex problems. PSO is a 'collective intelligence' strategy from the field of machine
learning that learns via trial and error and performs as well as or better than simulated
annealing and genetic algorithms [18-20]. Here we show that PSO algorithms also deliver
automated approaches to devising successful quantum measurement policies for implementation
in the PU.

Figure 2. Decision tree representations of two adaptive feedback policies for $N = 6$
photons. The PSO-generated policy is graphed in blue solid lines, the BWB-policy in gray
dotted lines. All $2^6$ possible experimental runs are represented by paths in the tree. The
path corresponding to an experiment with detections $u_1 u_2 \ldots u_6 = 100000$ is marked
by ♦, the path corresponding to the detections $u_1 u_2 \ldots u_6 = 011010$ is highlighted
by I. For each path in the tree, the inner nodes represent the applied feedback phases
$\Phi_m$ and the leaf shows the final phase estimate $\tilde\varphi$.

Our method is effective even if the quantum system is a
black box, i.e. even if nothing is known about the system itself.
The only prerequisite is a comparison criterion during the 
training phase by which the success of candidate policies 
can be evaluated. 

To explain how we use machine learning for the quantum measurement problem, consider the
decision tree required by the PU to update the feedback $\Phi$. The measurement of the $i$-th
qubit yields one bit $u_i$ of information about which way the photon exited. (If the photon
is lost, there is no detection at all and hence no bit. Therefore, a policy must be robust
against loss.) After $m$ photons have been processed, the PU stores the $m$-bit string
$n_m = (u_m u_{m-1} \ldots u_1)$ and computes the feedback phase $\Phi_m$. In the most general
case of a uniform prior distribution for $\varphi \in [0, 2\pi)$, there is no optimal setting
for the initial feedback, so we set $\Phi_0 = 0$ without loss of generality. All subsequent
$\Phi_m$ are chosen according to a prescribed decision tree.

In order to show that our method not only works, but is superior to existing feedback-based
quantum measurements, we choose the Berry-Wiseman-Breslin (BWB) policy [14] as a benchmark.
The BWB-policy is the most precise policy known to date for interferometric phase estimation
with direct measurement of the interferometer output. Furthermore, its practicality has been
demonstrated in a recent experiment [11]. The BWB-policy achieves its best performance with
the input state $|\Psi_N\rangle$. We use the same input state to ensure a fair comparison.
However, any more practical input state can be used and the PSO will autonomously learn good
feedback policies.

In Fig. 2 we depict the decision trees of the BWB-policy and of our six-photon policy. At
depth $m$, a measurement $u_{m+1} = 0$ directs the path to the left and $u_{m+1} = 1$ to the
right. The final destination of the path yields an estimate $\tilde\varphi$ of $\varphi$,
which is solely determined by the measurement record $n_N$. Each experimental course
corresponds to a path in the decision tree, where a path is a string of applied feedback
phases $\Phi_0, \Phi_1(n_1), \ldots, \Phi_{N-1}(n_{N-1})$ plus a final phase estimate
$\tilde\varphi(n_N)$.

A policy is entirely characterized by all the actions it can possibly take, thus by the
$2^{N+1} - 1$ phase values $\Phi_0, \Phi_1(0), \Phi_1(1), \Phi_2(00), \Phi_2(01), \ldots \in [0, 2\pi)$.
Therefore, a policy can be parametrized as a vector $p$ in the policy space
$[0, 2\pi)^{2^{N+1}-1}$, and any such vector $p$ forms a valid policy.

For addition and scalar multiplication modulo $2\pi$, the policy space forms a vector space.
However, the dimension $2^{N+1} - 1$ of this space grows exponentially with $N$, making
numeric optimization computationally intractable. Hence, we have to decrease the dimension of
the search space exponentially by excluding policies.

In the case of logarithmic search, the adjustments of the feedback phase,
$\Delta\Phi_m := |\Phi_m - \Phi_{m-1}|$, follow the recursive relation
$\Delta\Phi_m = \frac{1}{2}\Delta\Phi_{m-1}$. Here, we generalize this search approach and
treat $\Delta\Phi_1, \ldots, \Delta\Phi_{N-1}$ and $\Delta\tilde\varphi$ as independent
variables. In the emerging trees, the adjustment $\Delta\Phi_m$ depends only on the depth $m$,
i.e. the number of measurements performed, but not on the full measurement history $n_m$:

$$\Phi_m = \Phi_{m-1} - (-1)^{u_m} \Delta\Phi_m. \tag{2}$$

Equivalently, the final phase estimate is determined via

$$\tilde\varphi = \Phi_{N-1} - (-1)^{u_N} \Delta\tilde\varphi. \tag{3}$$

By this parametrization, $(\Delta\Phi_1, \ldots, \Delta\Phi_{N-1}, \Delta\tilde\varphi)$ fully
defines a decision tree, because the initial feedback phase is set to $\Phi_0 = 0$. The
dimension of the resulting policy space $\mathcal{P} = [0, 2\pi)^N$ is linear in $N$.
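
To make the reduced parametrization concrete, the following minimal sketch follows one path through the decision tree using Eqs. (2) and (3). The function names `run_policy` and `full_policy_size` are hypothetical, and reducing the phases modulo $2\pi$ is an assumption made for illustration.

```python
import math

TWO_PI = 2 * math.pi

def full_policy_size(n_photons):
    """Number of phase values in the unrestricted decision tree: 2**(N+1) - 1."""
    return 2 ** (n_photons + 1) - 1

def run_policy(deltas, outcomes):
    """Return (feedback phases Phi_0..Phi_{N-1}, final estimate).

    deltas   -- (dPhi_1, ..., dPhi_{N-1}, dEstimate), one entry per photon
    outcomes -- detection bits u_1, ..., u_N
    """
    if len(deltas) != len(outcomes):
        raise ValueError("need one adjustment per detected photon")
    phases = [0.0]  # Phi_0 = 0
    for delta, u in zip(deltas, outcomes):
        # Eqs. (2) and (3): subtract the adjustment for u = 0, add it for u = 1.
        phases.append((phases[-1] - (-1) ** u * delta) % TWO_PI)
    return phases[:-1], phases[-1]

# Logarithmic search for N = 3: each adjustment is half the previous one.
feedback, estimate = run_policy([math.pi / 2, math.pi / 4, math.pi / 8], (1, 0, 1))
print(feedback, estimate)
```

For $N = 14$ the unrestricted tree has $2^{15} - 1 = 32767$ phase values, whereas the reduced policy has 14 parameters.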

Furthermore, our dimensional reduction is promising because the policy space $\mathcal{P}$
includes a good approximation of the BWB-policy. Therefore, the best policies of this class
will presumably outperform the BWB-policy.

Now that the policy space is appropriately small, 
we employ a PSO algorithm. This population-based 
stochastic optimization algorithm is inspired by social be- 
havior of birds flocking or fish schooling to locate feeding 
sites [21]. Instead of birds and flocks, we employ the 
standard terms particles and swarms. 

Figure 3. (a) Holevo phase variance of the PSO-optimized policies in comparison to the
BWB-policy and globally optimal policies for varying operational cost $N$. The blue shaded
area shows the domain of quantum-enhanced measurements. (b) Performance of the PSO policies
with probability of photon loss $\eta$. All curves follow a power law for $N > 7$, indicated
by solid lines.

To search for the optimal phase estimation policy, the PSO algorithm models a swarm of
particles moving in the policy space $\mathcal{P}$. The position
$p^{(i)} = (\Delta\Phi_1, \ldots, \Delta\Phi_{N-1}, \Delta\tilde\varphi) \in \mathcal{P}$ of
particle $i$ represents a candidate policy for estimating $\varphi$, which is initially chosen
randomly. Given the policy $p^{(i)}$, the sharpness $S(p^{(i)})$ is analytically computed and
disclosed to the particle.

The PSO algorithm updates the candidate policies of all particles, i.e. their positions in
the policy space $\mathcal{P}$, in sequential rounds. At every time step, each particle
displays the sharpest policy $\hat p^{(i)} \in \mathcal{P}$ it has found so far to the rest of
the swarm. Then all particles try other policies by moving in the policy space. The moving
direction of each particle is based on its own experience and also on what other particles in
its neighborhood have discovered to be the best overall policy.

The computation of the sharpness $S(p)$ has exponential time complexity in $N$. Consequently,
policies can be optimized only for small $N$. In practice, the values of $N$ achieved in
experiments are quite small, much less than 14, so small-$N$ simulations are of practical
value.

We have trained the quantum learning algorithm for phase estimation up to a total photon
number of $N = 14$. In each case, the PSO algorithm tries to find the sharpest policy
$(\Delta\Phi_1, \ldots, \Delta\Phi_{N-1}, \Delta\tilde\varphi)$. However, as the algorithm
involves stochastic optimization, it is not guaranteed to learn the optimal policy every time,
so it must be run several times independently for each $N$. Rerunning the PSO algorithm
increases the developmental cost for the policies but does not affect their operational cost.

Fig. 3(a) depicts the performance of our quantum learning algorithm and compares it to the
BWB-policy. Within the limits of the available computational resources, the PSO policies
outperform the BWB-policy. To provide a quantitative estimate of the performance difference,
we calculated the scaling of the Holevo phase variance for $N > 7$, where both curves follow a
clear power law (solid lines). Our policy yields $V \propto N^{-\alpha}$ with a scaling of
$\alpha_{\mathrm{PSO}} = 1.472 \pm 0.005$, compared to BWB's
$\alpha_{\mathrm{BWB}} = 1.408 \pm 0.005$.
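
A scaling exponent of this kind can be obtained from a straight-line fit in log-log coordinates. The sketch below applies an ordinary least-squares fit to the last column of Table II, read here as the Holevo variance of the PSO policies for $N = 8, \ldots, 14$. The fitting procedure behind the quoted exponents is not specified, so `fit_power_law` is a hypothetical helper and its result need not reproduce the quoted value exactly.

```python
import math

def fit_power_law(ns, variances):
    """Least-squares fit of log V = log c - alpha * log N; returns (alpha, c)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(v) for v in variances]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return -slope, math.exp(y_mean - slope * x_mean)

# Values copied from Table II (last column), N = 8, ..., 14.
ns = [8, 9, 10, 11, 12, 13, 14]
variances = [0.14561, 0.12253, 0.10482, 0.09094, 0.07985, 0.07083, 0.06337]
alpha, prefactor = fit_power_law(ns, variances)
print(alpha, prefactor)
```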

Any practical policy has to be robust to photon loss. In Fig. 3(b), we have graphed the
performance of our policies for loss rates $\eta$ up to 40% and calculated the scaling
$\alpha_\eta$ for $N > 7$. We found $\alpha_{0.1} = 1.421 \pm 0.006$,
$\alpha_{0.2} = 1.377 \pm 0.008$, and $\alpha_{0.4} = 1.307 \pm 0.009$. This shows that our
PSO-generated policies, which are optimized for a lossless interferometer, are robust against
moderate loss (which is also true for the BWB-policy). Moreover, one could train the PSO
algorithm for a fixed loss rate $\eta$, which increases the computation time for the sharpness
evaluation by a factor of $N$.

The dimensional reduction of the search space comes at 
the price of possibly excluding superior policies. In addi- 
tion to proposing the BWB-policy, the authors performed 
in [14] a brute force search for 'globally optimal policies' 
in the exponential space. This was done by approximating
$[0, 2\pi)$ with a mesh and evaluating every possible
combination of feedback phases. The performance of the 
optimal policies of this search is shown in Fig. 3(a). We 
APPENDIX



A. Interferometer Description 



We use the convention

$$|0\rangle = \hat a^\dagger |\mathrm{vac}\rangle, \qquad |1\rangle = \hat b^\dagger |\mathrm{vac}\rangle, \tag{4}$$

where $\hat a^\dagger$ and $\hat b^\dagger$ are the creation operators for the field modes $a$
and $b$, and $|\mathrm{vac}\rangle$ denotes the vacuum. We consider a Mach-Zehnder
interferometer, where the first 50:50 beam splitter combining the two inputs has a
scattering matrix



Bi = 



V2V 1 



(5) 



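The second beam splitter $B_2$ is chosen such that it recovers the input if the phases $\Phi$
and $\varphi$ of both arms are equal, i.e. $B_2 = B_1^{-1}$. The operator of the Mach-Zehnder
interferometer is given by [23]

$$\hat I(\theta) = \exp\!\left[-\theta\left(\hat a^\dagger \hat b - \hat a\, \hat b^\dagger\right)\right] \tag{6}$$

with $\theta = \frac{1}{2}(\varphi - \Phi)$.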



B. Input States 

For single-shot interferometric phase estimation, so-called 'minimum uncertainty states' have
been proposed to reduce the Holevo variance of the estimates of $\varphi$ [14, 24]. These
states are symmetric with respect to permutations of qubits, and therefore the relevant
quantities are the numbers $n_a$ and $n_b$ of photons in modes $a$ and $b$. In this case, the
product of two Fock states for the two modes $a$ and $b$, denoted $|n_a, n_b\rangle$ with
$N = n_a + n_b$, is a convenient basis.

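The minimum uncertainty state $|\Psi_N\rangle$ with $N$ qubits is given by

$$|\Psi_N\rangle = \left(\tfrac{N}{2}+1\right)^{-1/2} \sum_{n,k=0}^{N} \sin\!\left(\tfrac{k+1}{N+2}\pi\right) e^{i\frac{\pi}{2}(k-n)}\, d^{N/2}_{n-\frac{N}{2},\,k-\frac{N}{2}}\!\left(\tfrac{\pi}{2}\right) |n, N-n\rangle, \tag{7}$$

where $d^{j}_{\nu,\mu}(\beta)$ is Wigner's (small) $d$-matrix [15].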



The minimum uncertainty state has been found to be the optimal input state for single-shot
adaptive interferometric phase estimation [14], but, due to its entanglement, it is naturally
hard to prepare. The BWB-policy achieves its best performance with the input state (7), so we
use the same input state to ensure a fair comparison. As for the BWB-policy, any other, more
practical, input state can be used and the PSO will autonomously learn a good adaptive
strategy.



C. Feedback Technique 

The value of $\theta$ changes with the progress of the experiment due to the varying feedback
$\Phi$. In our notation, $\Phi_m$ is the feedback phase applied after the $m$-th detection.
Hence, at the time when the $m$-th particle of the input state passes the interferometer, the
phase difference between the two arms is parametrized by

$$\theta_m = \tfrac{1}{2}(\varphi - \Phi_{m-1}). \tag{8}$$

The remaining input state $|\psi(n_m, \varphi)\rangle$ after $m$ photodetections is given by

$$|\psi(n_m, \varphi)\rangle = \hat c_{u_m}(\theta_m) \cdots \hat c_{u_2}(\theta_2)\, \hat c_{u_1}(\theta_1)\, |\Psi_N\rangle, \tag{9}$$

where

$$\hat c_{u_k}(\theta_k) = \frac{\hat a \cos\!\left(\theta_k - u_k\frac{\pi}{2}\right) - \hat b \sin\!\left(\theta_k - u_k\frac{\pi}{2}\right)}{\sqrt{N-k+1}} \tag{10}$$

is the Kraus operator [25] representing a measurement of the $k$-th particle with outcome
$u_k$ [14]. The states (9) are not normalized. In fact, their squared norm represents the
probability

$$P(n_m|\varphi) = \langle\psi(n_m, \varphi)|\psi(n_m, \varphi)\rangle \tag{11}$$

for obtaining the measurement record $n_m$ given $\varphi$.
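
The state update of Eqs. (9) to (11) can be sketched numerically in the Fock basis $|n_a, n_b\rangle$. The code below is an illustration under stated assumptions, not the implementation used for the results: the helper names `apply_kraus` and `record_probability` are hypothetical, the state is stored as a dictionary from $n_a$ to amplitude, and the feedback phases are passed in as a fixed list rather than computed from a policy.

```python
import math

def apply_kraus(state, u, theta, remaining):
    """Apply the Kraus operator of Eq. (10) for outcome u.

    state maps n_a -> amplitude of |n_a, remaining - n_a>. The result is the
    unnormalized state with remaining - 1 photons, in the same encoding.
    """
    angle = theta - u * math.pi / 2
    coeff_a, coeff_b = math.cos(angle), -math.sin(angle)
    out = {}
    for n_a, amp in state.items():
        n_b = remaining - n_a
        if n_a > 0:  # annihilate one photon in mode a
            out[n_a - 1] = out.get(n_a - 1, 0) + coeff_a * math.sqrt(n_a) * amp
        if n_b > 0:  # annihilate one photon in mode b
            out[n_a] = out.get(n_a, 0) + coeff_b * math.sqrt(n_b) * amp
    norm = math.sqrt(remaining)
    return {n: a / norm for n, a in out.items()}

def record_probability(state, n_photons, record, phi, feedback):
    """Squared norm of Eq. (9), i.e. P(n_m | phi) of Eq. (11).

    feedback[k] is the phase Phi_k that is active while photon k + 1 passes.
    """
    for k, u in enumerate(record):
        theta = 0.5 * (phi - feedback[k])  # Eq. (8)
        state = apply_kraus(state, u, theta, n_photons - k)
    return sum(abs(a) ** 2 for a in state.values())

# One photon entering in mode a: P(0) = cos^2(phi / 2), P(1) = sin^2(phi / 2).
print(record_probability({1: 1.0}, 1, (0,), 0.7, [0.0]))
```

For a normalized input state, the probabilities of all records of a given length sum to one, which is a convenient sanity check.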

D. Performance Measure For Policies 

In this section, we will show how the sharpness (1) can be analytically computed for a given
policy $p$. Our derivation follows the procedure in [14]. The sharpness is determined by the
probability that $p$ produces an estimate $\tilde\varphi_p$ with error
$\varsigma = \tilde\varphi_p - \varphi$,

$$P_p(\varsigma|\varphi) = \sum_{n_N \in \{0,1\}^N} P_p(n_N|\varphi)\, \delta\big(\varsigma - (\tilde\varphi_p(n_N) - \varphi)\big). \tag{12}$$

Here $P_p(n_N|\varphi)$ is the probability that the experiment, with feedback actions
determined by $p$, produces the measurement string $n_N$ given the phase value $\varphi$. We
use a flat prior for $\varphi$, i.e. $P(\varphi) = 1/(2\pi)$, so that

$$P_p(\varsigma) = \int_0^{2\pi} P(\varphi)\, P_p(\varsigma|\varphi)\, d\varphi = \frac{1}{2\pi} \sum_{n_N \in \{0,1\}^N} P_p\big(n_N \,\big|\, \tilde\varphi_p(n_N) - \varsigma\big). \tag{13}$$

From this probability distribution, we determine the sharpness with equation (1),

$$S(p) = \left| \int_0^{2\pi} P_p(\varsigma)\, e^{i\varsigma}\, d\varsigma \right| = \frac{1}{2\pi} \left| \sum_{n_N \in \{0,1\}^N} e^{i\tilde\varphi_p(n_N)} \int_0^{2\pi} P_p(n_N|\varphi)\, e^{-i\varphi}\, d\varphi \right|. \tag{14}$$

The probability $P_p(n_N|\varphi)$ is given by equation (11) and can be directly computed for
a given policy $p$. (For more details see [14].) From equation (14) it is apparent that
computing the sharpness of a policy $p$ has complexity $O(2^N)$. This is because the summand
has to be evaluated for every bit string $n_N$ of length $N$.
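
The structure of this computation, a sum over all $2^N$ measurement records with one integral over $\varphi$ per record, can be sketched as follows. The sketch assumes the reduced parametrization of Eqs. (2) and (3) and evaluates the integral with a rectangle rule on a uniform grid instead of analytically. The probability model is passed in as a function, and the single-photon model `single_photon_prob` is a toy stand-in rather than the minimum uncertainty state; all names are hypothetical.

```python
import cmath
import itertools
import math

def sharpness(deltas, record_prob, grid=64):
    """Sharpness of a policy (dPhi_1, ..., dPhi_{N-1}, dEstimate).

    record_prob(record, phi, feedback) must return the probability of the
    outcome tuple `record` given phi and the feedback phases Phi_0..Phi_{N-1}.
    """
    n_photons = len(deltas)
    phis = [2 * math.pi * j / grid for j in range(grid)]
    total = 0j
    for record in itertools.product((0, 1), repeat=n_photons):  # 2**N terms
        phases = [0.0]
        for delta, u in zip(deltas, record):
            phases.append(phases[-1] - (-1) ** u * delta)  # Eqs. (2) and (3)
        feedback, estimate = phases[:-1], phases[-1]
        # Rectangle rule for the integral of P(record | phi) * exp(-i phi).
        integral = sum(record_prob(record, phi, feedback) * cmath.exp(-1j * phi)
                       for phi in phis) * (2 * math.pi / grid)
        total += cmath.exp(1j * estimate) * integral
    return abs(total) / (2 * math.pi)

def single_photon_prob(record, phi, feedback):
    """Toy model: one photon in mode a, with cos^2 / sin^2 detection statistics."""
    theta = 0.5 * (phi - feedback[0])
    return math.cos(theta) ** 2 if record[0] == 0 else math.sin(theta) ** 2

print(sharpness([math.pi / 2], single_photon_prob))
```

For this toy model the sharpness works out to $\frac{1}{2}|\sin\Delta\tilde\varphi|$, so the best single-photon policy in this class has $S = 1/2$. The loop over `itertools.product` is the source of the exponential cost.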



E. Optimization Problem 

Given the policy space $\mathcal{P}$, the optimization problem is defined as finding a policy

$$p_{\max} \in \arg\max_{p \in \mathcal{P}} S(p), \tag{15}$$

i.e. find a $p_{\max}$ such that $S(p_{\max}) \ge S(p)$ for all $p \in \mathcal{P}$.



F. Details of the employed PSO algorithm 

In this section the details of the PSO algorithm we employed are presented. The swarm
$\mathcal{S} = \{p_1, p_2, \ldots, p_\Xi\}$ is composed of a set of particles
$i = 1, 2, \ldots, \Xi$, where $p_i$ is the set of properties of the $i$-th particle and
$\Xi \in \mathbb{N}$ is the population size. At any time step $t$, $p_i$ includes the position
$p_t^{(i)} \in \mathcal{P}$ of particle $i$ and $\hat p_t^{(i)}$, which is the best position
$i$ has visited until time step $t$.



N    $\Xi$   PSO steps   $\phi_1$   $\phi_2$   $\omega$   $v_{\max}$   Fraction
4    50      700         0.5        1          1          0.05         100%
5    50      700         0.5                   1          0.05         100%
6    50      700         0.5                   1          0.05         100%
7    50      500         0.5                   0.8        0.2          100%
8    60      300         0.5                   0.8        0.2          35%
9    60      500         0.5                   0.8        0.2          33%
10   60      400         0.5                   0.8        0.2          25%
11   60      400         0.5                   0.8        0.2          66%
12   120     1000        0.5                   0.8        0.2          20%
13   375     300         0.5                   0.8        0.2          17%
14   441     100         0.5                   0.8        0.2          20%

Table I. PSO settings for the $N$-photon input state: swarm size $\Xi$, number of PSO steps,
exploitation weight $\phi_1$, exploration weight $\phi_2$, velocity damping $\omega$, maximum
step size $v_{\max}$, and the fraction of PSO runs that produced policies with the variance
depicted in Fig. 3.



Particle $i$ communicates with other particles in its neighborhood
$\mathcal{N}^{(i)} \subseteq \mathcal{S}$. The neighborhood relations between particles are
commonly represented as a graph, where each vertex corresponds to a particle in the swarm and
each edge establishes a neighbor relationship between a pair of particles. This graph is
commonly referred to as the swarm's population topology.

We have adopted the common approach of setting the neighborhood $\mathcal{N}^{(i)}$ of each
particle in a pre-defined way regardless of the particles' positions. For that purpose the
particles are arranged in a ring topology. For particle $i$, all particles with a maximum
distance of $r$ on the ring are in $\mathcal{N}^{(i)}$.

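The PSO algorithm updates the position of all particles in a round-based manner as follows.
At time step $t$:

1. Each particle $i = 1, 2, \ldots, \Xi$ assesses the sharpness $S(p_t^{(i)})$ of its current
position $p_t^{(i)}$ in the policy space (and updates $\hat p^{(i)}$ if necessary).

2. Each particle $i$ communicates the sharpest policy $\hat p^{(i)}$ it has found so far to
all members of its neighborhood $\mathcal{N}^{(i)}$.

3. Each particle $i$ determines the sharpest policy
$\hat g^{(i)} = \arg\max_{\hat p^{(j)}:\, j \in \mathcal{N}^{(i)}} S(\hat p^{(j)})$ found so
far by any one particle in its neighborhood $\mathcal{N}^{(i)}$ (including itself).

4. Each particle $i$ changes its position according to

$$p_{t+1}^{(i)} = p_t^{(i)} + \Delta p_t^{(i)}, \qquad \Delta p_t^{(i)} = \omega \left( \Delta p_{t-1}^{(i)} + \phi_1 \cdot \mathrm{rand}() \cdot \big(\hat p^{(i)} - p_t^{(i)}\big) + \phi_2 \cdot \mathrm{rand}() \cdot \big(\hat g^{(i)} - p_t^{(i)}\big) \right). \tag{16}$$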



N    $\Delta\Phi_1, \ldots, \Delta\Phi_{N-1}$    $\Delta\tilde\varphi$    $V_p$

4    1.5701, 0.7862, 0.5043    0.3507    0.37621

5    1.5722, 0.7816, 0.5293, 0.3684    0.2739    0.27922

6    1.5708, 0.7830, 0.5669, 0.3881, 0.2889    0.2306    0.21835

7    1.5708, 0.7854, 0.6159, 0.4130, 0.3073, 0.2421    0.1988    0.17630

8    1.5708, 0.7854, 0.6663, 0.4399, 0.3264, 0.2551, 0.2080    0.1750    0.14561

9    1.5708, 0.7854, 0.7079, 0.4620, 0.3440, 0.2671, 0.2164, 0.1811    0.1554    0.12253

10    1.5708, 0.7854, 0.7392, 0.4788, 0.3599, 0.2780, 0.2240, 0.1867, 0.1597    0.1393    0.10482

11    1.5706, 0.7850, 0.7613, 0.4934, 0.3744, 0.2875, 0.2313, 0.1920, 0.1642, 0.1421    0.1260    0.09094

12    1.5708, 0.7854, 0.7800, 0.5023, 0.3890, 0.2983, 0.2384, 0.1973, 0.1677, 0.1456, 0.1285    0.1149    0.07985

13    1.5695, 0.7847, 0.7920, 0.5119, 0.4029, 0.3083, 0.2457, 0.2027, 0.1720, 0.1487, 0.1310, 0.1170    0.1054    0.07083

14    1.5703, 0.7860, 0.8018, 0.5195, 0.4171, 0.3179, 0.2529, 0.2077, 0.1756, 0.1517, 0.1335, 0.1190, 0.107326    0.0975    0.06337

Table II. Parameters for the best PSO-generated policies for $N = 4, \ldots, 14$.

The parameter $\omega$ represents a damping factor that assists convergence, and rand() is a
function returning uniformly distributed random numbers in $[0, 1]$. The 'exploitation weight'
$\phi_1$ parametrizes the attraction of a particle to its best previous position
$\hat p^{(i)}$, and the 'exploration weight' $\phi_2$ describes the attraction to the best
position $\hat g^{(i)}$ in the neighborhood. To improve convergence, we bound each component
of $\Delta p^{(i)}$ by a maximum value of $v_{\max}$. In summary, the properties of the swarm,
such as size and behavior, are defined by the following parameters.

$\omega \in [0, 1]$    velocity damping factor
$\phi_1 \in [0, 1]$    exploitation weight
$\phi_2 \in [0, 1]$    exploration weight
$\Xi$    population size
$v_{\max}$    maximum step size particles are allowed to move
$r$    interaction range of particles

(17)

Clearly, the success and the number of PSO steps required to find the maximum are highly
dependent on the values of these parameters. For instance, with increasing $N$, a bigger
population size is required to account for the rising dimensionality of the search space. The
most successful settings we found are listed in Table I. For each $N = 4, \ldots, 14$, the
best policy $(\Delta\Phi_1, \ldots, \Delta\Phi_{N-1}, \Delta\tilde\varphi)$ the PSO algorithm
learned is given in Table II.
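
The round-based procedure and update rule (16) can be sketched as a short program. This is a minimal illustration under stated assumptions rather than the code used for the reported results: the function `pso_maximize` and its default settings are hypothetical, positions are wrapped modulo $2\pi$, and the objective `toy_fitness` is a simple periodic stand-in for the sharpness $S(p)$.

```python
import math
import random

TWO_PI = 2 * math.pi

def pso_maximize(fitness, dim, swarm_size=20, steps=300, omega=0.8,
                 phi1=0.5, phi2=1.0, v_max=0.2, radius=1, seed=0):
    """Ring-topology PSO; returns (best position, best fitness)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(0, TWO_PI) for _ in range(dim)] for _ in range(swarm_size)]
    vel = [[0.0] * dim for _ in range(swarm_size)]
    best_pos = [p[:] for p in pos]          # personal best positions
    best_fit = [fitness(p) for p in pos]    # and their fitness values
    for _ in range(steps):
        # Steps 2-3: find the best personal best within distance `radius` on the ring.
        guides = []
        for i in range(swarm_size):
            ring = [(i + d) % swarm_size for d in range(-radius, radius + 1)]
            j = max(ring, key=lambda k: best_fit[k])
            guides.append(best_pos[j][:])
        # Step 4: damped velocity update, clamped component-wise to v_max.
        for i in range(swarm_size):
            for d in range(dim):
                step = omega * (vel[i][d]
                                + phi1 * rng.random() * (best_pos[i][d] - pos[i][d])
                                + phi2 * rng.random() * (guides[i][d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, step))
                pos[i][d] = (pos[i][d] + vel[i][d]) % TWO_PI
            # Step 1 of the next round: assess the new position.
            value = fitness(pos[i])
            if value > best_fit[i]:
                best_fit[i], best_pos[i] = value, pos[i][:]
    winner = max(range(swarm_size), key=lambda k: best_fit[k])
    return best_pos[winner], best_fit[winner]

def toy_fitness(p):
    """Stand-in objective with its maximum at p = (1, ..., 1); not the sharpness."""
    return sum(math.cos(x - 1.0) for x in p)

position, value = pso_maximize(toy_fitness, dim=2)
print(position, value)
```

In an actual application, `fitness` would be replaced by an evaluation of the sharpness of the candidate policy, and the run would be repeated with different seeds because the optimization is stochastic.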



G. Noise resistance 

As with the BWB-policy, which works with an idealized 
noiseless model, the training of our PSO algorithm is 
performed based on the simulation of a noiseless Mach- 
Zehnder interferometer. However, Higgins et al. recently 
used the BWB-policy as a component for their exper- 
iment [11], which shows that the feedback policies we 
considered for optimization are robust against noise. 

Therefore, the policies generated by our learning ap- 
proach are applicable to moderately noisy experiments, 



even though the PSO algorithm was trained on a simu- 
lated noisefree experiment. 

Figure 3 shows the Holevo variance of the PSO-generated policies for different photon-loss
rates. We have calculated the variance as follows. For a fixed loss rate $\eta$, the
probability of detecting $k$ of the $N$ input photons is given by the binomial distribution
$B(k; N, \eta) = \binom{N}{k} \eta^{N-k} (1-\eta)^{k}$. Then the probability that the policy
$p$ produces an estimate $\tilde\varphi_p$ with error $\varsigma = \tilde\varphi_p - \varphi$ is

$$P_p(\varsigma) = \sum_{k=0}^{N} B(k; N, \eta)\, P(\varsigma, k). \tag{18}$$

An analogous calculation to the one in appendix D yields

$$S(p) = \left| \sum_{k=0}^{N} B(k; N, \eta)\, y(k) \right| \tag{19}$$

with

$$y(k) = \frac{1}{2\pi} \sum_{n_k \in \{0,1\}^k} e^{i\tilde\varphi_p(n_k)} \int_0^{2\pi} P_p(n_k|\varphi)\, e^{-i\varphi}\, d\varphi. \tag{20}$$

The probability $P_p(n_k|\varphi)$ is given by equation (11).
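
The binomial weighting of Eqs. (18) and (19) is simple to reproduce. In the sketch below, `detection_weights` and `lossy_sharpness` are hypothetical helper names, and the per-$k$ contribution $y(k)$ is supplied by the caller as a function, since evaluating Eq. (20) requires the full state simulation.

```python
import math

def detection_weights(n_photons, eta):
    """B(k; N, eta) for k = 0..N: probability that k of N photons are detected."""
    return [math.comb(n_photons, k) * eta ** (n_photons - k) * (1 - eta) ** k
            for k in range(n_photons + 1)]

def lossy_sharpness(n_photons, eta, y):
    """Eq. (19): mix the contributions y(k) with the binomial weights."""
    weights = detection_weights(n_photons, eta)
    return abs(sum(w * y(k) for k, w in enumerate(weights)))

# Weights for N = 4 photons at a loss rate of 40%.
print(detection_weights(4, 0.4))
```

For $\eta = 0$ all weight sits on $k = N$, so the expression reduces to the lossless sharpness.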

Figure 3 shows that our PSO-generated policies with the state (7) as input are remarkably
robust against photon loss. Even with a loss rate of $\eta = 40\%$, the variance scales as
$V \propto N^{-1.307 \pm 0.009}$, and the measurement lies in the domain of quantum-enhanced
measurements.

The strong robustness against photon loss is mainly due to the nature of the input state (7),
which is highly entangled and symmetric with respect to qubit permutations. As a consequence,
this state remains entangled even if a high percentage of photons are lost.



